Variable Span disfluency detection in ASR transcripts

نویسندگان

  • Rahul Gupta
  • Sankaranarayanan Ananthakrishnan
  • Zhaojun Yang
  • Shrikanth S. Narayanan
چکیده

Natural conversations often involve disfluencies in the form of revisions, repetitions, interjections, filled pauses and such. This paper focuses on word/phrase repetitions and revisions that are lexically well formed. These are generally captured by an ASR but pose problems to downstream processing such as spoken language translation (SLT). We describe a system to identify such word level disfluencies with a goal towards removing them in real time from the automatic recognition (ASR) system output. We use a span based training system to utilize the contextual information while tagging disfluencies. We design our system on the oracle transcripts and test them on both reference and ASR transcripts. We achieve an area under the receiver operating characteristics (ROC) curve for word level disfluency detection of .93 and .87 for the reference and the ASR transcripts respectively.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Joint, Incremental Disfluency Detection and Utterance Segmentation from Speech

We present the joint task of incremental disfluency detection and utterance segmentation and a simple deep learning system which performs it on transcripts and ASR results. We show how the constraints of the two tasks interact. Our joint-task system outperforms the equivalent individual task systems, provides competitive results and is suitable for future use in conversation agents in the psych...

متن کامل

Joint Transition-based Dependency Parsing and Disfluency Detection for Automatic Speech Recognition Texts

Joint dependency parsing with disfluency detection is an important task in speech language processing. Recent methods show high performance for this task, although most authors make the unrealistic assumption that input texts are transcribed by human annotators. In real-world applications, the input text is typically the output of an automatic speech recognition (ASR) system, which implies that...

متن کامل

Tight Integration of Speech Disfluency Removal into SMT

Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems deploy a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluen...

متن کامل

Efficient Disfluency Detection with Transition-based Parsing

Automatic speech recognition (ASR) outputs often contain various disfluencies. It is necessary to remove these disfluencies before processing downstream tasks. In this paper, an efficient disfluency detection approach based on right-to-left transitionbased parsing is proposed, which can efficiently identify disfluencies and keep ASR outputs grammatical. Our method exploits a global view to capt...

متن کامل

Variable-Span out-of-vocabulary named entity detection

Out-of-vocabulary named entities (OOV NEs) are always misrecognized by fixed-vocabulary automatic speech recognition (ASR) systems. This has a negative impact on downstream applications such as language understanding and machine translation (MT). Automatic detection of OOV NEs in ASR hypotheses can help mitigate this problem by triggering the use of alternative approaches to acquire and process...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014